(54) Cache coherence network for a multiprocessor data processing system



(57) A cache coherence network for transferring 
coherence messages between processor caches in a 
multiprocessor data processing system is provided. The 
network includes a plurality of processor caches associ- 
ated with a plurality of processors, and a binary logic tree 
circuit which can separately adapt each branch of the 
tree from a broadcast configuration during low levels of 
coherence traffic to a ring configuration during high lev- 
els of coherence traffic. A cache snoop-in input receives 
coherence messages and a snoop-out output outputs, 
at the most, one coherence message per current cycle 
of the network timing. A forward signal on a forward output
indicates that the associated cache is outputting a
message on snoop-out during the current cycle. A cache
outputs received messages in a queue on the snoop-out 
output after determining any response message based 
on the received message. The binary logic tree circuit 
has a plurality of binary nodes connected in a binary tree 
structure. Each branch node has a snoop-in, a snoop- 
out, and a forward connected to each of a next higher 
level node and two lower level nodes. A forward signal 
on a forward output indicates that the associated node 
is outputting a message on snoop-out to the higher node 
during the current cycle. Each branch ends with multiple 
connections to a cache at the cache's snoop-in input, 
snoop-out output, and forward output.
Description

The present invention relates in general to cache 
coherence networks for multiprocessor data processing 
systems.

A cache coherence network connects a plurality of
caches to provide the transmission of coherence messages
between the caches, which allows the caches to
maintain memory coherence. A snoopy cache coherence
mechanism is widely used and well understood in
multiprocessor systems. Snoopy cache coherence in
multiprocessor systems uses a single bus as the data
transmission medium. The single bus allows messages
and data to be broadcast to all caches on the bus
at the same time. A cache monitors (snoops on) the bus
and automatically invalidates data it holds when the
address of a write operation seen on the bus matches
the address the cache holds.

A single bus cache coherence network becomes
impractical in medium-to-large multiprocessor systems.
As the number of processors in the system increases, a
significant load is placed on the bus to drive the larger
capacity, and the volume of traffic on the bus is
substantially increased. Consequently, cycle time of the snoopy
bus scales linearly with the number of caches attached
to the bus. At some point, the cycle time of the snoopy
bus will become larger than the cycle time of the
processors themselves, resulting in a saturation of the bus.
Combining this with the fixed throughput of one
coherence message per cycle of the bus, the bus quickly
saturates as the number of caches attached to the bus
increases. Thus, there is a limit to the number of caches
that can be maintained effectively on a single snoopy
bus. What is needed is an interconnection network that
can adapt under the heavy electrical loading and
increased traffic conditions that may result in a large
multiprocessor system, thus providing scalability to the
system. It would be further desirable to provide an
interconnection network that acts logically like, and
affords a broadcast capability like, the snoopy bus.

It is the object of the present invention to provide an 
adaptive, scalable cache coherence network for a data 
processing system which acts like a snoopy bus and 
which provides broadcast capability. 

The foregoing objects are achieved as is now
described. According to the present invention as
claimed, a cache coherence network for transferring
coherence messages between processor caches in a
multiprocessor data processing system is provided. The
network includes a plurality of processor caches
associated with a plurality of processors, and a binary logic tree
circuit which can separately adapt each branch of the
tree from a broadcast configuration during low levels of
coherence traffic to a ring configuration during high
levels of coherence traffic.

In at least a preferred embodiment, each cache has
a snoop-in input, a snoop-out output, and a forward
output, wherein the snoop-in input receives coherence
messages and the snoop-out output outputs, at the most, one
coherence message per current cycle of the network
timing. A forward signal on a forward output indicates that
the associated cache is outputting a message on the
snoop-out during the current cycle. A cache generates
coherence messages according to a coherency protocol,
and, further, each cache stores messages received on
the snoop-in input in a message queue and outputs
messages loaded in the queue on the snoop-out output after
determining any response message based on the
received message.

The binary logic tree circuit has a plurality of binary 
nodes connected in a binary tree structure, starting at a 
top root node and having multiple branches formed of 
branch nodes positioned at multiple levels of a branch. 
Each branch node has a snoop-in, a snoop-out, and a
forward output connected to each of a next higher level
node and two lower level nodes, such that a branch node
is connected to a higher node at a next higher level of
the tree structure, and to a first lower node and second
lower node at a next lower level of the tree structure. A
forward signal on a forward output indicates that the
associated node is outputting a message on snoop-out
to the higher node during the current cycle. Each branch
ends with multiple connections to a cache at the cache's
snoop-in input, snoop-out output, and forward output,
wherein the cache forms a bottom level node.

The invention will best be understood by reference 
to the following detailed description of an illustrative 
embodiment when read in conjunction with the accom- 
panying drawings, wherein: 

Figure 1 depicts a block diagram of a cache
coherence network;

Figure 2 shows a schematic diagram of a preferred
embodiment of a cache coherence network;

Figure 3 shows a schematic diagram of the logic
circuit of a preferred embodiment of a network node;

Figures 4 - 7 are the four possible port connection
configurations of the logic circuit of Figure 3, as it is
used in the embodiment of Figure 2;

Figure 8 shows the connections and message
transmission flow during a cycle of the cache
coherence network, under conditions of a first example;

Figure 9 shows the connections and message
transmission flow during a cycle of the cache
coherence network, under conditions of a second
example;

Figure 10 shows the connections and message
transmission flow during a cycle of the cache
coherence network, under conditions of a third example;

Figure 11 shows a schematic diagram of a logic
circuit of a preferred embodiment of a network node.

With reference now to the figures, and in particular
with reference to Figure 1, there is depicted a block
diagram of a cache coherence network. Network logic tree
10 is connected to a plurality of processor/caches
P0 - Pn-1. Each processor/cache Pi (Pn-1 >= Pi >= P0) represents
a processor with an associated cache, although the
physical implementation may not have the cache integral
to the processor as shown by the blocks in Figure 1. The
processor caches are also connected through a separate
data communications bus (not shown) for transferring
data blocks of memory between the processors and
the system's main memory.

As seen in Figure 1, each processor P0 - Pn-1 has
three connections to the network: snoop-out (SO),
forward (F), and snoop-in (SI). The F signal output from a
processor is a single-bit signal. The SO and SI signals are
multi-bit signals carried over a multi-bit bus. The
information flowing over the network from the SO and SI ports
is referred to as coherence traffic and can be divided into
two categories: coherence requests and coherence
responses. The requests and responses are in the form
of packetized messages which travel in the network as
a single uninterrupted unit. Coherence requests are
initiated by a cache in response to a main memory access
by its processor. A coherence response typically is
initiated by other caches responding to requests which they
have received on their SI inputs. An example of a
coherence request would be a message asking a cache to
invalidate a block of data, for example, (tag id)
DCache-block-flush. An example of a coherence response would
be an acknowledge message indicating the data block
has been invalidated in the cache, for example, Ack (tag
id). The coherence messages used in the cache
coherence network of the present invention could take on
many forms, including those well known and often used
in current snoopy coherency schemes.

The SO output is used for outputting a number of
messages onto the network. The network is timed, so
that a cache may output only one message during each
cycle of the network timing. The cache may issue a new
coherence request, or it may respond to a coherence
request by generating a response, or it may simply pass
on a request that it had received earlier over its SI port.
When a cache uses its SO port to output a coherence
message, it requests participation in the coherence
traffic over the network by negating its F signal. When a
cache is not requesting participation in the coherence
traffic, it always asserts its F signal and outputs a
negated signal on the SO port (i.e., SO = 0).
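
By way of illustration only, the port convention just described may be sketched as follows. The function name drive_ports and the encoding of a coherence message as a non-zero integer word are assumptions made for this sketch; they do not appear in the embodiment.

```python
from typing import Optional, Tuple


def drive_ports(pending: Optional[int]) -> Tuple[int, int]:
    """Return the (F, SO) values a cache drives during one network cycle.

    `pending` is the message word to send this cycle, or None when the
    cache has nothing to send. Encoding a message as a non-zero integer
    is an assumption of this sketch.
    """
    if pending is None:
        # Not participating: F is asserted and SO is held at 0.
        return 1, 0
    # Participating: F is negated and the message is placed on SO.
    return 0, pending


print(drive_ports(None))   # (1, 0)
print(drive_ports(0x2A))   # (0, 42)
```

Under this convention an idle cache contributes nothing to the OR-gates of the tree, which is what allows a message from elsewhere to pass through unchanged.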

A cache always receives coherence requests or
responses from other caches on its SI input. A cache
deletes a request it receives from the coherence traffic
on the SI port if it is one it had sent out earlier over the
SO port to be issued to the other processors in the
network. Suitable identification fields are placed within each
coherence message when it is sent out from an SO port,
thus enabling a receiving cache to identify the originating
cache of the message. In this way, a cache is able to
identify its own messages which it had sent out over the
network at a previous cycle, and to delete the message.
This message will be deleted regardless of whether the
F signal is asserted at the time of receipt.

A cache maintains a queue of incoming requests on
its SI port. This queue (not shown) is necessary because
over a given period of time the cache may be generating
its own coherence messages faster than it can evaluate
and/or rebroadcast the received messages. The cache
will delete a message from the SI queue if the message's
identification field shows it to be a message originating
from that cache.

In any cache coherence protocol which might be 
used with the preferred embodiment, the cache gener- 
ates a response message if a received message is rel- 
evant to its own contents and warrants a response. In 
addition, the cache may either forward a received
request out onto the network over its SO port or ignore it.

In accordance with the present invention, if the
cache had asserted the F signal when it received a
particular coherence request, the next processor in the
network must also have received that request (as explained
below). In that case, there is no need for the cache to
forward the message to the next cache in the network. If
the cache had negated the F signal at the time it received
the coherence request, and therefore had itself sourced
a valid coherence message to its SO port
simultaneously, the cache had clipped the broadcast mechanism
(as explained below) and must forward the received
coherence request to the next cache in the network.
What constitutes the "next" cache in the network may be
logically different than the physical makeup of the
computer system. The "next" cache or processor is
deciphered from the logic of the network logic tree 10, which
is made up of the network nodes. In the preferred
embodiment as shown in Figure 2, it will be shown that,
because of the logic circuitry, a "next" processor is the
processor to the left of a given processor, and is labelled
with a higher reference number (i.e., P1 > P0). But
because of the network connection at the root node of
the tree, P0 is the "next" processor after processor P7.

Along with saving the incoming message in the SI 
queue, the receiving cache saves the current state of the 
F signal at the time it receives the queued message. 
Preferably, the F signal is saved with the message in the 
SI queue. To determine whether to forward a received 
message out onto the network, the cache will check the 
state of the F signal at the time that the coherence mes- 
sage was received, which was stored in the message 
queue at the same time as the message. 
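
The queue handling described above may be sketched as follows, purely as an illustration. The names Message, SnoopInQueue, receive and pop, and the choice to drop a cache's own message on receipt rather than later, are assumptions of this sketch and are not taken from the embodiment.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple


@dataclass
class Message:
    origin: int    # identification field: id of the originating cache
    payload: str   # e.g. a request or an acknowledge (hypothetical encoding)


class SnoopInQueue:
    """SI queue that stores each message with the F state seen at receipt."""

    def __init__(self, cache_id: int) -> None:
        self.cache_id = cache_id
        self.entries: Deque[Tuple[Message, int]] = deque()

    def receive(self, msg: Message, f_at_receipt: int) -> None:
        # A cache deletes its own message regardless of the F signal.
        if msg.origin == self.cache_id:
            return
        self.entries.append((msg, f_at_receipt))

    def pop(self) -> Optional[Tuple[Message, bool]]:
        """Return (message, must_forward) for the oldest entry, or None."""
        if not self.entries:
            return None
        msg, f = self.entries.popleft()
        # F == 1 at receipt: the next cache also received it, no forwarding.
        # F == 0 at receipt: this cache clipped the broadcast, so it must
        # forward the message on a later cycle.
        return msg, f == 0


q = SnoopInQueue(cache_id=2)
q.receive(Message(2, "flush"), f_at_receipt=0)   # own message, dropped
q.receive(Message(1, "flush"), f_at_receipt=0)   # clipped here
q.receive(Message(5, "flush"), f_at_receipt=1)   # broadcast went on
print(q.pop())   # (Message(origin=1, payload='flush'), True)
print(q.pop())   # (Message(origin=5, payload='flush'), False)
```

Storing the F bit beside the message keeps the forwarding decision correct even when the cache evaluates the message several cycles after it arrived.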

Referring now to Figure 2, there is depicted a pre- 
ferred embodiment of an adaptable, scalable binary tree 
cache coherence network in a multiprocessor data 
processing system, according to the present invention. 
The network is comprised of eight processors and their
associated caches, P0 - P7, and the network nodes,
NODE1-7. Together they form a network by which the
processors P0 - P7 can efficiently pass coherence
messages to maintain coherent memory within their caches.
This network is able to adapt to varying volumes and
kinds of coherence messages being transmitted over the
network. The binary tree structure of the transmission
network has a cycle time which scales to the logarithm
of the number of caches (i.e., processors) connected to
the network. This enables the network of the present
invention to be scalable to medium-sized to large-sized
multiprocessor systems. When there is light traffic on the
network, processors are able to broadcast coherence
messages to other processors, providing a quick and
efficient cache coherence mechanism. As coherence traffic
increases, the network is able to adapt and pass
messages in a ring-like manner to the next processor in the
network. In that configuration, the network bandwidth is
increased by allowing pipelining of coherence traffic. In
fact, the throughput of coherence messages through the
network can be as high as the number of caches in the
network. Also, the ring connections substantially reduce
driving requirements. Moreover, the network is also able
to adapt to varying degrees of increased traffic by
segmenting itself into broadcast sections and ring sections,
depending on the locality of increased traffic.

The network logic tree 10 (in Figure 1) is comprised
of a plurality of network nodes connected together in a
binary logic tree structure, and each of the processors
of the multiprocessor system is connected at the leaves
of the binary logic tree. In the preferred embodiment of
Figure 2, the network logic tree comprises root node
NODE1 at the top level of the tree and branch nodes
NODE2-7 formed along branches at lower levels of the
tree.

Each network node NODE1-7 is designed with an
identical logic circuit, that which is depicted in Figure 3,
according to a preferred embodiment of the present
invention. This circuit is the same circuit used in carry
look-ahead adder circuits. Therefore, the operation of
this circuit is well understood and well known by those
skilled in the art. The organization and operation of a
binary logic tree using the carry look-ahead circuit as the
universal link has been described in the prior art. See
G.J. Lipovski, "An Organization For Optical Linkages
Between Integrated Circuits", NCC 1977, which is
incorporated herein by reference. This paper describes the
use of a carry look-ahead circuit in a binary logic tree
to configure a broadcast or propagating link optical
communication network.

Network node 100 has three connections to a higher
level node in the tree: SO, F, and SI; and six connections
to two lower level nodes in the tree: SO0, F0, and SI0
connected to a first lower level node, and SO1, F1, and
SI1 connected to a second lower level node. Each SO
and SI port is labelled with a w to indicate that the port
accommodates w-bit-wide signals. Each of the F ports
accommodates a 1-bit-wide signal.

The SI port has an arrow pointing into the node 100
to show that the node receives messages from the higher
level node on that port. The SO and F ports have arrows
pointing away from the node showing that these are
output signals from the node to a higher level node in the
binary tree. Similarly, the SI0 and SI1 have arrows
pointing away from node 100 showing that they are outputs
from node 100 and inputs (snoop-in) into their respective
lower level nodes. Ports F0, SO0, F1, and SO1 are shown
with arrows pointing into node 100 to indicate that they
are outputs from the lower level nodes and inputs into
node 100.

The circuit of Figure 3 is combinational, and has no
registers within it. The logic of the tree works as
stipulated when all signals are valid and stable. However, the
processors and caches which use the tree are
independently clocked circuits. In some system designs, it may
therefore be necessary to provide queues at the ports of
the tree and design an appropriate handshaking
mechanism for communication between a cache and its tree
ports. The tree is clocked independently and works on
the entries in front of the SO and F queues at its leaf
ends. (In fact, a separate F queue is not necessary, if an
empty SO queue implies an asserted F signal.) The tree
forwards the data to caches over the SI ports.
Additionally, if delays through the tree are not acceptable (for the
required cycle time of the tree), the tree can be pipelined
by adding registers at appropriate levels of the tree.

It should be noted that although the circuit of Figure
3 simply and efficiently provides the transmission
connections required for the present invention, it will be
appreciated by those skilled in the art that other circuit
configurations which provide the same input and output
connections to provide the same logical function could
also be used in the present invention. For example,
Figure 11 is a schematic diagram of a logic circuit which
may be used as a network node in an alternative
embodiment of the present invention. Also, the logic of the
forward signals or the snoop-in/snoop-out signals could be
inverted and the binary logic tree circuitry designed to
operate on these inverted signals, as will be appreciated
by those skilled in the art.

The operation of the circuit in Figure 3 is predicated
on the states of the forward signals F0 and F1. Therefore,
there are four possible configurations under which the
logic circuit operates. These four configurations are
shown in Figures 4-7.

Figure 4 diagrams the connections between ports
in node 100 when both forward signals from the lower
level nodes are not asserted (i.e., F0 = F1 = 0). Because
both nodes have negated their forward signals, the lower
level nodes will be outputting coherence messages over
their SO ports. SO0 will be transmitted to SI1 through
logical OR-gate OR1. The negated forward signals will turn
off AND-gates AND1, AND2 and AND3. This allows SO1
to pass through OR2 to SO. SI is directly connected to
SI0.

The second configuration of Figure 3 will produce a
connection of ports in node 100 as diagramed in Figure
5. In the second configuration, node0 (the node
connected to the right branch of node 100 and not shown)
is not transmitting (i.e., it is forwarding) a coherence
message to its next higher level node, in this case node 100.
Therefore, node0 has asserted its forward signal F0. The
other node connected to node 100, node1, is transmitting
a message to the next higher node, node 100, and thus
has negated its forward signal F1. With F1 = 0, AND3
outputs F = 0. The asserted F0 allows SI to transmit
through AND1 into OR1. Because, by definition with F0
asserted, SO0 is not outputting any messages, only the
output of AND1 is output at port SI1. Again, with F1 = 0,
AND2 is closed and SO1 passes through OR2 to SO.

Referring now to Figure 6, there is diagramed a third
configuration of the logic circuit of Figure 3. In this
situation node0 is transmitting a message over the network
and node1 is not: F0 = 0 and F1 = 1. F0 closes AND3 to
produce F = 0. Once again SI is directly connected to
SI0. Because F0 is negated, node0 is transmitting messages
over SO0, which is directly connected to SI1 through
OR1. The negated F0 closes AND1 as an input into OR1.
The asserted F1 allows SO0 to pass through AND2 into
OR2. By definition, an asserted F1 indicates that no
messages are output on SO1, and therefore the output of
AND2 passes through OR2 to SO.

The fourth possible configuration of the logic circuit
of Figure 3 occurs when neither of the lower level nodes
is transmitting messages to node 100. A diagram of the
transmission connections for this configuration is shown
in Figure 7. Here, F0 = F1 = 1. These inputs generate F
= 1 from AND3. SI is directly connected to SI0. F0 is
asserted, allowing SI to pass through AND1 and OR1 to
SI1. Node0 is not transmitting, so SO0 does not pass
through OR1 to SI1. Although SO1 is connected through
OR2 to SO, and SO0 is connected through AND2 and
OR2 to SO, those connections are not shown to simplify
the diagram of Figure 7, since neither node is
transmitting any messages over its snoop-out port.
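
The four configurations may be checked against a small model of the node logic. The gate equations below are reconstructed from the prose description of AND1, AND2, AND3, OR1 and OR2, since Figure 3 itself is not reproduced here; the function names gate and node and the integer message words are assumptions of this sketch.

```python
from typing import Tuple


def gate(enable: int, bus: int) -> int:
    """AND a 1-bit enable with every bit of a w-bit bus."""
    return bus if enable else 0


def node(f0: int, so0: int, f1: int, so1: int, si: int) -> Tuple[int, int, int, int]:
    """Return (F, SO, SI0, SI1) for one network node."""
    f = f0 & f1                    # AND3
    so = so1 | gate(f1, so0)       # OR2(SO1, AND2(F1, SO0))
    si0 = si                       # SI is wired directly to SI0
    si1 = so0 | gate(f0, si)       # OR1(SO0, AND1(F0, SI))
    return f, so, si0, si1


A, B, S = 1, 2, 4   # hypothetical words: from node0, from node1, from above

print(node(0, A, 0, B, S))   # Figure 4: (0, 2, 4, 1)
print(node(1, 0, 0, B, S))   # Figure 5: (0, 2, 4, 4)
print(node(0, A, 1, 0, S))   # Figure 6: (0, 1, 4, 1)
print(node(1, 0, 1, 0, S))   # Figure 7: (1, 0, 4, 4)
```

In each case a non-transmitting lower node drives SO = 0, so at most one input of each OR-gate carries a message.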

Referring again back to Figure 2, root node NODE1
is the top level node of the binary logic tree. The SO of
NODE1 is directly connected to the SI of NODE1. The
two branches of the binary logic tree extending down
from the root node to the next level nodes NODE2,
NODE3 are comprised of three busses for delivering
signals. As can be seen from Figure 2, the connections of
NODE1 to NODE2 are equivalent to the connections
from node 100 to node0, as described with Figure 3, and
the connections of NODE1 to NODE3 are equivalent to
the connections of node 100 to node1, as described with
Figure 3.

From each node NODE2, NODE3, the binary tree
again branches into two connections to the lower level
nodes. Each of the higher level connections from
NODE4-NODE7 is connected to its associated next
higher level node's lower level connections. The branch
nodes NODE4-NODE7 in turn have two branch
connections to the next lower level nodes, in this case, those
nodes being the processors/caches P0 - P7. Each
processor P0 - P7 has its SO, F, and SI connected to the
lower level connections of the next higher level node
(i.e., NODE4-NODE7).

For three examples of how the cache coherence
network of the present invention adapts to coherence traffic
on the network, consider Figures 8 - 10. For the first
example, consider the extreme case where every cache
on the network is attempting to transmit a coherence
message onto the network. In this extreme case, every
cache must receive every other cache's message and
potentially might respond with another message for each
received message. Such a scenario forces the cache
coherence network into a ring-type network where each
cache passes a received message on to the next cache
in the network after determining any response of its own
to the message.

In the example of Figure 8, it can be seen that all
caches are negating their forwarding signals (F = 0), so
that they may transmit a coherence message out onto
the network. Consequently, NODE4 - 7 will have negated
forward inputs from the lower level nodes. Thus, the logic
circuit of each node will create transmission connections
equal to those shown in Figure 4, as shown in Figure 8.
As can be seen from Figure 4, NODE4 - 7 will also
negate their forward signals, resulting in NODE2 and
NODE3 being configured as Figure 4. Last, NODE1 also
has two negated forward signal inputs, configuring
NODE1 as Figure 4.

The dashed arrows shown in Figure 8 indicate data
flow within the network. As can be seen, with every cache
in the network outputting a message on the network
during this current cycle, each cache only transmits to the
next cache in the network. For example, P0 outputs its
coherence message on its snoop-out (SO). This arrives
at NODE4 on its SO0 port, which is connected to its SI1
port, which delivers P0's coherence message to P1 at
its SI port. P1 outputs its message on its SO port, which
arrives at SO1 of NODE4. This is transmitted to the SO
port of NODE4, and on to the node at the next higher
level of the tree. In this case, the next higher node from
NODE4 is NODE2. Here, the message arrives on the
right branch leading to NODE2. NODE2 is configured to
transfer this message back down the left branch to
NODE5. In turn, NODE5 connects its SI port to the SI
port of the right branch node at the next lower level from
NODE5, in this case, that node being P2. It can be seen,
then, that the coherence message output from P1 is
transmitted through NODE4, up to NODE2, back down
to NODE5, and then arrives at P2.

By inspecting the transmission paths of the
remainder of the processors, it can be seen that each processor
passes its coherence message on to only the next
processor in the network. Because that next processor is also
transmitting a message onto the network, the message
from the previous processor is necessarily clipped and
is not sent on to any other processors in the system. This
can be understood, with reference to Figure 8, by
noticing that data moves in one direction within the network.
Because of the particular logic circuit used in the
preferred embodiment, data generally travels from the
right-hand side of the network to the left-hand side before
passing over the top of the tree to transmit to the
remainder of the tree on the right-hand side. Thus, in the
preferred embodiment, the next processor in the network is
the next processor to the left.

In Figure 8, the network has formed a ring-network.
In this network, each processor passes a network mes- 
sage on to the next processor. The caches continue to 
pass the message along to the next cache in the ring 
with each cycle of the network until every cache in the 
network has received the message. 
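
The ring behaviour of this first example can be reproduced with a gate-level sketch of the whole tree. It assumes the node equations implied by the description of Figures 4-7, leaf index i standing for processor Pi with even indices on the right branch, and message words encoded as non-zero integers; the name simulate and these encodings are illustrative only.

```python
from typing import List, Optional


def gate(enable: int, bus: int) -> int:
    """AND a 1-bit enable with every bit of a bus word."""
    return bus if enable else 0


def simulate(sending: List[Optional[int]]) -> List[int]:
    """Return the SI word seen by each cache in one network cycle.

    sending[i] is the word Pi drives on SO, or None if Pi is idle.
    The number of caches must be a power of two.
    """
    # Upward pass: each level holds (F, SO) pairs, leaves first.
    levels = [[(1, 0) if m is None else (0, m) for m in sending]]
    while len(levels[-1]) > 1:
        below = levels[-1]
        level = []
        for k in range(0, len(below), 2):
            (f0, so0), (f1, so1) = below[k], below[k + 1]
            level.append((f0 & f1, so1 | gate(f1, so0)))
        levels.append(level)

    # The SO of the root node is connected directly to its own SI.
    si = [levels[-1][0][1]]

    # Downward pass: SI0 = SI and SI1 = SO0 | (F0 & SI) at every node.
    for depth in range(len(levels) - 2, -1, -1):
        below = levels[depth]
        nxt = []
        for k, s in enumerate(si):
            f0, so0 = below[2 * k]
            nxt.append(s)
            nxt.append(so0 | gate(f0, s))
        si = nxt
    return si


# Every cache transmits: Pi sends the word i + 1.
received = simulate([i + 1 for i in range(8)])
print([w - 1 for w in received])   # [7, 0, 1, 2, 3, 4, 5, 6]
```

The printed list gives, for each Pi in turn, the index of the cache whose message it received: each cache hears only its right-hand neighbour, and P0 hears P7 through the root connection.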

Referring now to Figure 9, there is depicted a
diagram of the data flow for a second example within a
preferred embodiment of the cache coherence network of
the present invention. In this extreme example, only one
processor in the network is attempting to transmit a
coherence message over the network during the current
cycle. Because no other messages are being sent over
the network during the current cycle, the one processor
transmitting over the network is able to broadcast its
message to every other processor within this one cycle.
Here, P1 is transmitting a message, and, therefore, has
negated its forward signal. All other caches, having not
transmitted a message, have asserted their forward
signals (P2 - P7 have F = 1). Therefore, NODE5 - 7 are
configured as shown in Figure 7. Each of these nodes
asserts its forward signal. This results in NODE3 being
configured as shown in Figure 7. NODE4 receives a
negated forward signal from its left branch and an
asserted forward signal from its right branch, coming
from the processor nodes P1 and P0, respectively. This
places NODE4 in the configuration of Figure 5. NODE2
receives an asserted forward signal from NODE5 and a
negated forward signal from NODE4, configuring it as
shown in Figure 6. Similarly, NODE1 receives an
asserted forward signal from NODE3 on its left branch,
and a negated forward signal from NODE2 on its right
branch, resulting in a configuration as shown in Figure 6.

Given this structure of the network connections
during the current cycle, the dashed arrows in Figure 9
describe the direction of coherence message
transmission from processor P1 to the rest of the processors
connected to the cache coherence network. The message
output from P1's SO port passes through NODE4 up into
NODE2, where the message is transferred both back
down the left branch from NODE2 to NODE5, and up
through NODE2 and up along the right branch of
NODE1. The message wrapping around from P1
through NODE5 is then transferred back down both the
left and right branches of NODE5 to processors P2 and
P3. The message also is transmitted through SO of
NODE2 along the right branch of NODE1. This message
is transferred back down the left branch of NODE1 to be
broadcast back down the entire left-hand side of the
binary logic tree so that P4 - P7 receive the message.
The message is also transmitted up through the SO port
of NODE1, which wraps back down through the
right-hand branch of NODE1 into NODE2, and again down
the right-hand branch of NODE2 into NODE4, where the
message is passed down both branches of NODE4 into
P0 and P1.

As can be seen from the above description of Figure
9, the cache coherence network of the present invention
was able to adapt itself to a broadcast network so that a
single processor was able to broadcast the message to
the entire network within one cycle of the cache
coherence system. The message spreads out along the
branches of the tree to all processors to the left of the
broadcasting processor that are within the broadcaster's
half of the binary logic tree. When the broadcasted
message reaches the root node, NODE1, the message is
passed back down along the right-hand side of the
broadcasting processor's half of the tree so that all
processors to the right of the broadcasting processor in its
half of the tree receive the message. At the same time,
the message is broadcast down from the root node to all
processors in the entire other half of the binary logic tree.
In the broadcast mode, the broadcasting processor will
also receive its own message. As has been explained, the
received message will contain an identification field
which indicates to the broadcasting cache that the
received message was its own, and thus should be
ignored.

Referring now to Figure 10, there is depicted a third
example of the connections and data transmission in a
preferred embodiment of the cache coherence network
of the present invention during a particular cycle of the
network. This example shows how the present invention
can adapt to provide a combination of the ring and
broadcast networks under conditions between the two
extremes described in the examples of Figure 8 and
Figure 9.

In this example, for the current cycle, processors P1,
P2, P4, and P5 are transmitting coherence messages
onto the network, as is indicated by their negated forward
signals. Processors P0, P3, P6, and P7 are not
transmitting onto the network during the current cycle, as is
indicated by their asserted forward signals.

A 0-1 forward signal input into NODE4 configures it
as shown in Figure 5. A 1-0 forward signal input into NODE5
configures it as shown in Figure 6. A 0-0 forward signal input into
NODE6 configures it as shown in Figure 4. A 1-1 forward signal
input into NODE7 configures it as shown in Figure 7. The forward
signals from both NODE4 and NODE5 are negated,
configuring NODE2 as seen in Figure 4. The forward signal
of NODE6 is negated and the forward signal of NODE7
is asserted, configuring NODE3 as shown in Figure 6.
The forward signals of NODE2 and NODE3 are both
negated, configuring NODE1 as seen in Figure 4.
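
The configuration steps above can be traced in a short sketch. The following Python is a minimal illustration, not the circuit itself. It assumes that 1 denotes an asserted forward signal (not transmitting) and 0 a negated one, that a node's own forward signal is asserted only when both of its children's signals are asserted (inferred from this example), and that each pair is written with the higher-numbered child first, which is the reading that reproduces the pairs quoted in the text. The names FIGURE_FOR_PAIR and configure_tree are hypothetical.

```python
# Sketch only: the table is copied from the example in the text; the
# pair ordering and the AND rule for node forward signals are assumptions.
FIGURE_FOR_PAIR = {
    (0, 0): "Figure 4",
    (0, 1): "Figure 5",
    (1, 0): "Figure 6",
    (1, 1): "Figure 7",
}


def configure_tree(leaf_forward):
    """Return the figure chosen for each node, leaf-adjacent level first.

    leaf_forward lists the forward signals of P0..P7 (1 = asserted,
    i.e. not transmitting). Its length is assumed to be a power of two.
    """
    level = list(leaf_forward)
    configs = []
    while len(level) > 1:
        figures, parents = [], []
        for i in range(0, len(level), 2):
            low, high = level[i], level[i + 1]
            # Pair is looked up as (higher-numbered child, lower-numbered child).
            figures.append(FIGURE_FOR_PAIR[(high, low)])
            # Assumed rule: the node forwards "asserted" only if both children do.
            parents.append(low & high)
        configs.append(figures)
        level = parents
    return configs


# P1, P2, P4 and P5 transmit; P0, P3, P6 and P7 do not.
example = configure_tree([1, 0, 0, 1, 0, 0, 1, 1])
```

For the example cycle, example[0] holds the choices for NODE4 through NODE7, example[1] those for NODE2 and NODE3, and example[2] that for NODE1.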

Processor P1's message will pass through NODE4,
up the right branch of NODE2, down the left branch into
NODE5, and down the right branch of NODE5 into processor
P2. Because processor P2 was also broadcasting
a message during this cycle, NODE5 could not be configured
to allow both P1's and P2's messages to be transferred
down the left branch of NODE5 into P3. Thus, P1's
message is clipped at P2, and P2 must maintain P1's
coherence message in its SI queue to be retransmitted
to the rest of the network on the next or a succeeding
cycle. Also, P1's message was not transmitted up
through SO of NODE2 because another processor to the
left of P1 at a leaf of the NODE2 branch was also transmitting
and, therefore, was given the connection to the
snoop-out of NODE2.

As can be seen from Figure 10, P2's message 
passed back down the left branch of NODE5 to P3 and 
up through SO of NODE5, through to the SO of NODE2 
to NODE1. At NODE1, the message was transmitted 
back down the left branch of NODE1 to NODE3. The 
message from P2 passes down the right branch of 
NODE3 and the right branch of NODE6 into P4. There
the message is clipped because of P4's transmission.
P4's transmission is also clipped by P5's transmission, 
and therefore is passed only to NODE6 and then back 
down to P5. 

P5's message passes up through NODE6 and then
up along the right branch of NODE3, where it is sent both
back down the left branch of NODE3 and up to the next
higher level node, NODE1. The message passing down
the left branch of NODE3 is broadcast through NODE7
into processors P6 and P7. The message sent through
the snoop-out of NODE3 routes back up through
NODE1, and down along the right branch of NODE1 and
NODE2 into NODE4. At NODE4, the message of P5 is
broadcast to both P0 and P1.

When using the network of the present embodiment, 
there is a danger that a cache initiating a coherence message
might assert its forward signal during a cycle in which
one of its own messages currently pending on the network
is delivered back to it. The problem arises in that
now the message has been passed over to a subsequent 
processor once again, which will reforward it throughout 
the network, and that this could continue forever since 
the initiating cache may continue to assert its forward sig- 
nal. 

To correct for this danger of a continuously forwarded
request, additional logic can be added to the network.
At the root level node in the network, the path that
forwards data from the left half of the tree to the right half
of the tree would have a decrementer, and so would the
path that goes in the opposite direction. Each coherence
request sent out over the network would contain an additional
2-bit value that is decremented every time the message
traverses between the two half-trees. An additional
bit in the request carries a flag stating that the request is
valid or invalid. This flag bit is set to "true" by the initiating
cache, while the count value is set to 2. The flag is turned
to invalid when the count is already 0 when the request
reaches the root node. All invalid requests received are
to be discarded at once by a receiving cache. If a unitary
coding of 2 is used, an easy implementation of the decrementers
is a mere inverter. The right-to-left transfer
negates one bit of the two-bit count value, and the left-to-right
transfer negates the other. The logical OR-ing of the
two count bits as they come into the root node generates
the valid bit.
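
The counter scheme above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the initial count of 2 is encoded as two set bits, the valid bit is taken from the incoming bits before the root inverts one of them, and a message alternates direction on successive crossings. Which bit each direction inverts is chosen arbitrarily here. The function name cross_root and the direction labels are hypothetical.

```python
def cross_root(count_bits, direction):
    """Model one pass of a request through the root node.

    count_bits is a pair of 0/1 values; direction is "left_to_right"
    or "right_to_left". Returns (new_count_bits, valid).
    """
    a, b = count_bits
    # Valid bit: logical OR of the two count bits as they arrive.
    valid = a | b
    # Each direction's "decrementer" is modelled as an inverter on one bit.
    if direction == "left_to_right":
        a ^= 1
    elif direction == "right_to_left":
        b ^= 1
    else:
        raise ValueError("unknown direction: " + direction)
    return (a, b), valid


# A request starts with both count bits set (count of 2).
bits = (1, 1)
history = []
for direction in ("left_to_right", "right_to_left", "left_to_right"):
    bits, valid = cross_root(bits, direction)
    history.append(valid)
```

In this sketch the request is still valid on its first two arrivals at the root and is marked invalid on the third, when both count bits arrive as 0.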

Another problem is that of detecting whether all
caches have responded or deciding that no more caches
would respond at a later time to a particular coherence
message. This problem arises because of the adaptability
of the cache coherence network. As coherence traffic
increases, the number of messages clipped increases,
which necessarily delays the transmission of requests
and responses by additional cycles. This problem is best
solved by means of a protocol and time-out mechanism
that assumes an upper bound on the delay that each
cache may introduce in the path of a message and of the
corresponding response, assuming that each is clipped
at every cycle, and that adding up these delays will produce
an upper bound on the time after which no
responses may be expected by any caches in the network.
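
The bound described above amounts to a sum of per-cache delays. The Python below is one hedged way to compute such a time-out. The per-cache delay figures, and the assumption that the response sees the same worst-case delays as the request, are illustrative and do not come from the text. The name timeout_cycles is hypothetical.

```python
def timeout_cycles(worst_case_delay_per_cache):
    """Upper bound, in network cycles, on when a response can still arrive.

    worst_case_delay_per_cache holds, for each cache on the path, the
    assumed maximum number of cycles it may hold a clipped message.
    The factor of 2 covers the request and its response (an assumption).
    """
    one_way = sum(worst_case_delay_per_cache)
    return 2 * one_way


# Eight caches, each assumed to delay a message by at most 3 cycles.
bound = timeout_cycles([3] * 8)
```

A cache that has waited this many cycles without a response would, under these assumptions, treat the message as answered by no further caches.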

Although the present invention has been described 
in a scheme based on a binary tree, the present invention 
can easily be generalized to any M-ary tree. It has been
shown in the literature that a modified binary tree can be
imbedded in a hypercube. See "Scalability Of A Binary
Tree On A Hypercube", S.R. Deshpande and R.M. Jenevein,
ICPP 1986, incorporated herein by reference. This
technique can be applied to achieving a snoopy protocol
in a hypercube-based multiprocessor system.

In summary, the cache coherence network of the 
present invention automatically adapts to the coherence 
traffic on the network to provide the most efficient transmission
of coherence messages. The network adapts to
a broadcast network or a ring network, or any combination
in between, as a function of which caches attached
to the network are attempting to transmit coherence traffic
on the network. Thus, branches of the binary logic
tree with light coherence traffic may be predominantly
configured in a broadcast configuration to allow coherence
messages to be quickly delivered to each cache
within that branch. Still other branches with heavy coherence
traffic will automatically adapt to this increased traffic
and configure themselves predominantly in a ring
network.

While the invention has been particularly shown and 
described with reference to a preferred embodiment, it
will be understood by those skilled in the art that various
changes in form and detail may be made therein without
departing from the scope of the invention.

Claims 

1. A cache coherence network for transferring coherence
messages between processor caches in a multiprocessor
data processing system, the network
comprising:

a plurality of processor caches associated
with a plurality of processors, each cache having a
snoop-in input, a snoop-out output and a forward
output, wherein the snoop-in input is arranged to
receive coherence messages and the snoop-out is
arranged to output, at the most, one coherence message
per current cycle of the network timing, and
arranged so that a forward signal on the forward output
indicates that the cache is outputting a message
on the snoop-out output during the current cycle,
wherein a cache is arranged to generate coherence
messages according to a coherency protocol, and,
further, wherein each cache is arranged to store
messages received on the snoop-in input in a message
queue and to output messages loaded in the
queue on the snoop-out output after determining
any response message based on the received message; and

a binary logic tree circuit having a plurality of 
binary nodes connected in a binary tree structure, 
starting at a top root node and having multiple 
branches formed of branch nodes positioned at mul- 
tiple levels of a branch, and each branch node hav- 
ing a snoop-in, a snoop-out, and a forward 
connected to each of a next higher level node and 
two lower level nodes, such that a branch node is 
connected to a higher node at a next higher level of 
the tree structure, and to a first lower node and sec- 
ond lower node at a next lower level of the tree struc- 
ture, and arranged so that a forward signal on a 
forward indicates that the associated node is output- 
ting a message on snoop-out to the higher node dur- 
ing the current cycle, and wherein each branch ends 
with multiple connections to a cache at the cache's 
snoop-in input, snoop-out output, and forward output,
wherein the cache forms a bottom level node.

2. A cache coherence network as claimed in Claim 1,
wherein a node is arranged to transmit a message
received on the snoop-in from the higher node to the
snoop-in of the first lower level node, to transmit a
message received on the snoop-out of the first lower
level node to the snoop-in of the second lower level 
node, and to transmit a message received on the 
snoop-out of the second lower level node to the 
snoop-out going to the higher level node, when the 
first and second lower nodes are transmitting coher- 
ency messages during the current cycle. 

3. A cache coherence network as claimed in Claim
1 or Claim 2, wherein a node is arranged to transmit
a message received on the snoop-in from the higher 
node to the snoop-in of the first lower level node, and 
to transmit a message received on the snoop-out of 
the first lower level node to both the snoop-in of the 
second lower level node and the snoop-out going to 
the higher level node, when the first lower node is 
arranged to transmit a coherence message and the 
second lower node is not transmitting a coherency 
message during the current cycle.

4. A cache coherence network as claimed in any pre- 
ceding claim, wherein a node is arranged to transmit 
a message received on the snoop-in from the higher 
node to the snoop-in of the first lower level node and 
the snoop-in of the second lower level node, when 
the first and second lower nodes are not transmitting
coherency messages during the current cycle. 

5. A cache coherence network as claimed in any pre- 
ceding claim, wherein a node is arranged to transmit 
a message received on the snoop-in from the higher
node to both the snoop-in of the first lower level node 
and the snoop-in of the second lower level node, and 
to transmit a message received on the snoop-out of 
the second lower level node to the snoop-out going
to the higher level node, when the first lower node is 
not transmitting a coherence message and the sec- 
ond lower node is transmitting a coherency mes- 
sage during the current cycle. 

6. A cache coherence network as claimed in any pre- 
ceding claim, wherein the root node has the snoop- 
out to the higher node connected to the snoop-in 
from the higher node.

7. A cache coherence network as claimed in any pre- 
ceding claim, wherein a cache is arranged to assert 
a forward signal on the forward output when the 
cache is not transmitting a coherence message on
the snoop-out output and to negate the forward signal
on the forward output when the cache is transmitting
a coherence message during the current
cycle. 

8. A cache coherence network as claimed in any preceding
claim, wherein all nodes of the binary logic
tree circuit are carry look-ahead circuits. 
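
The four node behaviours recited in Claims 2 to 5 can be summarised in a short Python sketch. It is illustrative only: the port labels and the function name node_wiring are hypothetical, and the sketch models the connections of a single node in a single cycle rather than the claimed circuit.

```python
def node_wiring(first_transmitting, second_transmitting):
    """Map each driven port of a node to the ports it feeds this cycle.

    "SI_up"/"SO_up" are the snoop-in from and snoop-out to the higher
    node; "first" and "second" refer to the two lower level nodes.
    """
    if first_transmitting and second_transmitting:
        # Claim 2: both lower nodes transmit, so the node forms a ring.
        return {
            "SI_up": ["SI_first"],
            "SO_first": ["SI_second"],
            "SO_second": ["SO_up"],
        }
    if first_transmitting:
        # Claim 3: only the first lower node transmits.
        return {
            "SI_up": ["SI_first"],
            "SO_first": ["SI_second", "SO_up"],
        }
    if second_transmitting:
        # Claim 5: only the second lower node transmits.
        return {
            "SI_up": ["SI_first", "SI_second"],
            "SO_second": ["SO_up"],
        }
    # Claim 4: neither lower node transmits, so the node broadcasts.
    return {"SI_up": ["SI_first", "SI_second"]}


ring = node_wiring(True, True)
broadcast = node_wiring(False, False)
```

Here ring chains the three ports in series, while broadcast fans the incoming message out to both lower nodes.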